1 Information extraction and text segmentation: AuGEAS: authoritativeness grading,
estimation, and sorting

Ayman Farahat, Geoff Nunberg, Francine Chen 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Full text available: pdf(135.62 KB) Additional Information: full citation, abstract, references, citings

When searching for content in large heterogeneous document collections like the World
Wide Web, it is not easy to know which documents provide reliable, authoritative information
about a subject. The problem is particularly pointed as it concerns content search for
"high-value" informational needs such as retrieving medical information, where the cost of error
may be high. In this paper, a method is described for estimating the authoritativeness of a 
document based on textual, non-topi ... 

2 Link analysis: Web page scoring systems for horizontal and vertical search
Michelangelo Diligenti, Marco Gori, Marco Maggini 

May 2002 Proceedings of the eleventh international conference on World Wide Web 

Full text available: pdf(243.92 KB) Additional Information: full citation, abstract, references, index terms

Page ranking is a fundamental step towards the construction of effective search engines for
both generic (horizontal) and focused (vertical) search. Ranking schemes for horizontal
search like the PageRank algorithm used by Google operate on the topology of the graph,
regardless of the page content. On the other hand, the recent development of vertical
portals (vortals) makes it useful to adopt scoring systems focussed on the topic and taking 
the page content into account. In ... 



Keywords: Focused PageRank, HITS, PageRank, random walks, web page scoring systems 
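
PageRank, the horizontal-search scheme mentioned above, scores pages from link topology alone. A minimal power-iteration sketch follows. It assumes a small graph given as a Python adjacency list, the commonly used damping factor of 0.85, and uniform redistribution of rank from pages without outlinks; it illustrates standard PageRank only, not the focused, content-aware scoring systems that Diligenti, Gori, and Maggini propose. The function name `pagerank` and the toy graph are illustrative.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Standard PageRank by power iteration over an adjacency list.

    graph maps each page to the pages it links to. Pages with no
    outlinks (dangling pages) spread their rank uniformly over all pages.
    """
    node_set = set(graph)
    for targets in graph.values():
        node_set.update(targets)
    nodes = sorted(node_set)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}

    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        # Every page first receives the teleportation share plus its dangling share.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page then splits its damped rank equally among its outlinks.
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores = pagerank(web)
    for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(value, 3))
```

In this toy graph every page has outlinks, so page `d`, which has no in-links, ends with exactly the teleportation share (1 - 0.85)/4 = 0.0375, while `c`, linked from three pages, ranks highest.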



3 Link analysis: Link fusion: a unified link analysis framework for multi-type interrelated
data objects

Wensi Xi, Benyu Zhang, Zheng Chen, Yizhou Lu, Shuicheng Yan, Wei-Ying Ma, Edward Allan 
Fox 

May 2004 Proceedings of the 13th conference on World Wide Web 

Full text available: pdf(510.05 KB) Additional Information: full citation, abstract, references, index terms

Web link analysis has proven to be a significant enhancement for quality based web search. 
Most existing links can be classified into two categories: intra-type links (e.g., web 
hyperlinks), which represent the relationship of data objects within a homogeneous data 
type (web pages), and inter-type links (e.g., user browsing log) which represent the 
relationship of data objects across different data types (users and web pages). 
Unfortunately, most link analysis research only considers one type of ...

Keywords: data fusion, information retrieval, link analysis algorithms, link fusion 



4 Local versus global link information in the Web

Pavel Calado, Berthier Ribeiro-Neto, Nivio Ziviani, Edleno Moura, Ilmerio Silva 
January 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 1 

Full text available: pdf(413.06 KB) Additional Information: full citation, abstract, references, citings, index terms

Information derived from the cross-references among the documents in a hyperlinked 
environment, usually referred to as link information, is considered important since it can be 
used to effectively improve document retrieval. Depending on the retrieval strategy, link 
information can be local or global. Local link information is derived from the set of 
documents returned as answers to the current user query. Global link information is derived 
from all the documents in the collection. In th ... 

Keywords: Belief networks, World Wide Web, link analysis, local and global information 



5 Stable algorithms for link analysis
Andrew Y. Ng, Alice X. Zheng, Michael I. Jordan 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference on
Research and development in information retrieval 

Full text available: pdf(208.24 KB) Additional Information: full citation, abstract, references, citings, index terms

The Kleinberg HITS and the Google PageRank algorithms are eigenvector methods for 
identifying "authoritative" or "influential" articles, given hyperlink or citation information.
That such algorithms should give reliable or consistent answers is surely a desideratum, and 
in [ijcaiPaper], we analyzed when they can be expected to give stable rankings under
small perturbations to the linkage patterns. In this paper, we extend the analysis and show 
how it gives insight into ways of de ... 
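
HITS, the first of the two eigenvector methods named above, alternates between authority and hub updates. The sketch below assumes a small link graph given as a Python adjacency list and L2-normalizes both score vectors after each update, so repeated iteration approximates the principal eigenvectors of A^T A (authorities) and A A^T (hubs), where A is the adjacency matrix. It shows plain HITS only, not the stability analysis or the modified algorithms of Ng, Zheng, and Jordan; the function name `hits` and the example graph are illustrative.

```python
import math
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale a score vector to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(graph: Dict[str, List[str]],
         iterations: int = 50) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Plain HITS by power iteration; returns (hub, authority) scores."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}

    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        auth = {v: 0.0 for v in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        auth = normalize(auth)
        # Hub update: sum of authority scores of the pages linked to.
        hub = normalize({v: sum(auth[dst] for dst in graph.get(v, []))
                         for v in nodes})
    return hub, auth


if __name__ == "__main__":
    links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
    hub, auth = hits(links)
    print(sorted(auth.items(), key=lambda kv: -kv[1])[:2])
    print(sorted(hub.items(), key=lambda kv: -kv[1])[:3])
```

In the example, `a1` is linked from all three hubs and receives the highest authority score, while `h1` and `h2`, which both point to `a1` and `a2`, tie as the strongest hubs.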

6 Web Information Retrieval: The Importance of Prior Probabilities for Entry Page Search
Wessel Kraaij, Thijs Westerveld, Djoerd Hiemstra 

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(135.87 KB) Additional Information: full citation, abstract, references, citings, index terms

An important class of searches on the world-wide-web has the goal to find an entry page 
(homepage) of an organisation. Entry page search is quite different from Ad Hoc search. 
Indeed a plain Ad Hoc system performs disappointingly. We explored three non-content 
features of web pages: page length, number of incoming links and URL form. Especially the 
URL form proved to be a good predictor. Using URL form priors we found over 70% of all 
entry pages at rank 1, and up to 89% in the top 10. Non-conten ... 

Keywords: URLs, entry page search, language models, links, parameter estimation, prior 
probabilities 
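
The abstract singles out URL form as a particularly good predictor of entry pages. The sketch below shows one way such a prior could enter a ranking, under stated assumptions: the four URL classes (root, subroot, path, file) and the rules that assign them are hypothetical, as is estimating P(entry page | URL class) from labelled (url, is_entry_page) pairs with add-one smoothing. The prior is added in log space to a content log-likelihood, the usual way a document prior is combined with a language-model score; this is not a reproduction of the paper's exact models or parameters, and the function names `url_type`, `estimate_priors`, and `score` are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple
from urllib.parse import urlparse

# Hypothetical URL classes; the paper's own definitions may differ.
URL_TYPES = ("root", "subroot", "path", "file")
INDEX_PAGES = ("index.html", "index.htm")


def url_type(url: str) -> str:
    """Assign a URL to one of the hypothetical classes in URL_TYPES."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Treat an index page as the directory that contains it.
    if segments and segments[-1].lower() in INDEX_PAGES:
        segments = segments[:-1]
    if not segments:
        return "root"        # e.g. http://example.org/
    if "." in segments[-1]:
        return "file"        # e.g. http://example.org/a/b/c.html
    return "subroot" if len(segments) == 1 else "path"


def estimate_priors(labelled: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimate P(entry page | URL class) with add-one (Laplace) smoothing."""
    totals: Counter = Counter()
    entries: Counter = Counter()
    for url, is_entry in labelled:
        t = url_type(url)
        totals[t] += 1
        entries[t] += int(is_entry)
    return {t: (entries[t] + 1) / (totals[t] + 2) for t in URL_TYPES}


def score(content_log_likelihood: float, url: str,
          priors: Dict[str, float]) -> float:
    """Combine log P(Q|D) with log P(entry | URL class of D)."""
    return content_log_likelihood + math.log(priors[url_type(url)])


if __name__ == "__main__":
    training = [
        ("http://example.org/", True),
        ("http://example.org/about/", False),
        ("http://example.org/a/b/", False),
        ("http://example.org/a/b/c.html", False),
    ]
    priors = estimate_priors(training)
    # With equal content scores, the root URL is ranked above the file URL.
    for url in ("http://example.org/", "http://example.org/a/b/c.html"):
        print(url, round(score(-10.0, url, priors), 3))
```

Keeping the prior separate from the content score means the same content model can be reused while only the URL-class estimates change with the training data.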



7 Information retrieval session 7: web: Representing interests as a hyperlinked document
collection

Michelle Fisher, Richard Everson 

November 2003 Proceedings of the twelfth international conference on Information and 
knowledge management 

Full text available: pdf(111.85 KB) Additional Information: full citation, abstract, references, index terms



We describe a latent variable model for representing a user's interests as a hyperlinked 
document collection. By collecting hyper-text documents that a user views, creates or 
updates whilst at their computer, we are able to use not only the content of these 
documents but also the inter-connectivity of the collection to model the user's interests. The 
model uses Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced Topic 
Selection and decomposes the user's document collection ... 

Keywords: hyperlinked/hypertext document collections, information access, latent variable 
models, user interests 



8 Formal specification for a clinical cyclotron control system
Jonathan Jacky 

April 1990 ACM SIGSOFT Software Engineering Notes, Conference proceedings on
Formal methods in software development, volume 15 issue 4
Full text available: pdf(1.15 MB) Additional Information: full citation, references, index terms



9 A survey of Web metrics

Devanshu Dhyani, Wee Keong Ng, Sourav S. Bhowmick 
December 2002 ACM Computing Surveys (CSUR), volume 34 issue 4 

Full text available: pdf(289.28 KB) Additional Information: full citation, abstract, references, index terms

The unabated growth and increasing significance of the World Wide Web has resulted in a 
flurry of research activity to improve its capacity for serving information more effectively. 
But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" of 
Web resources and services. This observation points towards measurements and models 
that quantify various attributes of web sites. The science of measuring all aspects of 
information, especially its storage and retrieval or ... 

Keywords: Information theoretic, PageRank, Web graph, Web metrics, Web page similarity, 
quality metrics 

10 Does "authority" mean quality? predicting expert quality ratings of Web documents
Brian Amento, Loren Terveen, Will Hill 

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(773.39 KB) Additional Information: full citation, abstract, references, citings, index terms

For many topics, the World Wide Web contains hundreds or thousands of relevant 
documents of widely varying quality. Users face a daunting challenge in identifying a small 
subset of documents worthy of their attention. 



Link analysis algorithms have received much interest recently, in large part for their 
potential to identify high quality items. We report here on an experimental evaluation of this 
potential. 

We evaluated a number of link and content-based algorithms using a dat ... 
Keywords: exploiting hyperlink structure 



11 Authoritative sources in a hyperlinked environment
Jon M. Kleinberg 

January 1998 Proceedings of the ninth annual ACM-SIAM symposium on Discrete 
algorithms 

12 Structure of mathematical programming systems
Wm. Orchard-Hays

January 1968 Proceedings of the 1968 23rd ACM national conference 

Full text available: pdf(1.47 MB) Additional Information: full citation, abstract, index terms

A mathematical programming system (MPS), as now implemented on third generation 
computers, constitutes four separate subject areas: 1. Algorithmic and procedural 
capabilities 2. Problem formulation and solution techniques 3. Programming languages 4. 
System structure and use. Each of these areas involves extensive considerations and we can
not do justice to any of them in the time available. Since problem formulation and solution 
techniqu ... 



13 Link-based ranking 2: Searching the workplace web
Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, 
David P. Williamson 

May 2003 Proceedings of the twelfth international conference on World Wide Web 

Full text available: pdf(231.55 KB) Additional Information: full citation, abstract, references, index terms

The social impact from the World Wide Web cannot be underestimated, but technologies 
used to build the Web are also revolutionizing the sharing of business and government 
information within intranets. In many ways the lessons learned from the Internet carry over 
directly to intranets, but others do not apply. In particular, the social forces that guide the 
development of intranets are quite different, and the determination of a "good answer" for 
intranet search is quite different than on the Int ... 

14 Document Databases: Requirements for XML document database systems
Airi Salminen, Frank Wm. Tompa 

November 2001 Proceedings of the 2001 ACM Symposium on Document engineering 

Full text available: pdf(141.89 KB) Additional Information: full citation, abstract, references, citings, index terms

The shift from SGML to XML has created new demands for managing structured documents. 
Many XML documents will be transient representations for the purpose of data exchange 
between different types of applications, but there will also be a need for effective means to 
manage persistent XML data as a database. In this paper we explore requirements for an 
XML database management system. The purpose of the paper is not to suggest a single type 
of system covering all necessary features. Instead the pur ... 

Keywords: XML, XML database systems, data definition, data manipulation, data modelling, 
structured documents 



15 Finding authorities and hubs from link structures on the World Wide Web
Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, Panayiotis Tsaparas 
April 2001 Proceedings of the tenth international conference on World Wide Web 

Full text available: pdf(269.01 KB) Additional Information: full citation, references, citings, index terms



Keywords: Bayesian, Kleinberg's algorithm, SALSA, authorities, hubs, link analysis,
threshold, web searching 

17 Learners as authors: helping ESL employees in a Canadian bank prepare customer
relations and documentation material
Paul Beam, Diane Burke 

October 1994 Proceedings of the 12th annual international conference on Systems 
documentation: technical communications at the great divide 